Language-independent Techniques for Automated Text Summarization
Abstract
Text summarization is the process of distilling the most important information from one or more sources to produce an abridged version for particular users and tasks. Automatically generated summaries can significantly reduce the information overload on intelligence analysts in their daily work. Moreover, automated text summarization can be used for automated classification and filtering of text documents, information search over the Internet, content recommendation systems, online social networks, etc. The increasing trend of cross-border globalization, accompanied by the growing multilinguality of the Internet, requires text summarization techniques to work equally well on multiple languages. However, only some of the automated summarization methods proposed in the literature can be defined as “multilingual” or “language-independent,” as they are not based on any morphological analysis of the summarized text. In this chapter, we present a novel approach to “language-independent” extractive summarization called MUSE (MUltilingual Sentence Extractor), which represents the summary as a collection of the most informative fragments of the summarized document without any language-specific text analysis. We use a Genetic Algorithm to find the best linear combination of 31 sentence scoring metrics based on vector and graph representations of text documents. Our summarization methodology is evaluated on two monolingual corpora of English and Hebrew documents and, in addition, on a bilingual collection of English and Hebrew documents. The results are compared to 15 statistical sentence scoring methods for extractive single-document summarization found in the literature and to several state-of-the-art summarization tools. These bilingual experiments show that the MUSE methodology significantly outperforms the existing approaches and tools in both languages.
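The idea of scoring sentences by a linear combination of language-independent metrics can be illustrated with a minimal Python sketch. It is not the MUSE implementation: the three metrics (position, length, term frequency), the naive sentence splitter, and all function names are assumptions chosen for illustration, and the weights are passed in by hand, whereas the abstract describes 31 metrics whose weights are found by a Genetic Algorithm.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive punctuation-based splitter (an assumption, not from the source).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # \w+ matches Unicode word characters, so no language-specific rules are used.
    return re.findall(r"\w+", sentence.lower())


def summarize(text, weights, k=2):
    """Return the k highest-scoring sentences in their original order.

    weights: hypothetical (w_position, w_length, w_frequency) coefficients.
    """
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document-level statistics used to normalize the metrics.
    tf = Counter(t for toks in tokenized for t in toks)
    max_tf = max(tf.values(), default=1)
    max_len = max((len(toks) for toks in tokenized), default=0)

    scored = []
    for i, toks in enumerate(tokenized):
        position = 1.0 - i / n  # earlier sentences score higher
        length = len(toks) / max_len if max_len else 0.0
        frequency = (
            sum(tf[t] / max_tf for t in toks) / len(toks) if toks else 0.0
        )
        features = (position, length, frequency)
        # Linear combination of the metric values.
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((score, i))

    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

In a setup like the one the abstract describes, an optimizer would search over the weight vector against reference summaries; here the caller supplies it directly, for example `summarize(document, (0.5, 0.2, 0.3), k=3)`.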
Similar resources
Systematic literature review of fuzzy logic based text summarization
“Information overload” is not a new term, but massive developments in technology, which enable easy and unlimited access anytime and anywhere, together with participation in and publishing of information, have escalated its impact. Assisting users’ informational searches with reduced reading and surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...
Towards multi-lingual summarization: A comparative analysis of sentence extraction methods on English and Hebrew corpora
The trend toward the growing multilinguality of the Internet requires text summarization techniques that work equally well in multiple languages. Only some of the automated summarization methods proposed in the literature, however, can be defined as “language-independent”, as they are not based on any morphological analysis of the summarized text. In this paper, we perform an in-depth comparativ...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing large-volume documents, the extr...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
A System for Generating Cloze Test Items from Russian-Language Text
This paper studies the problem of automated educational test generation. We describe a procedure for generating cloze test items from Russian-language text, which consists of three steps: sentence splitting, sentence filtering, and question generation. The sentence filtering issue is discussed as an application of automatic summarization techniques. We describe a simple experimental system whic...
Publication date: 2010